Using Salient Words to Perform Categorization of Web Sites
Authors
Abstract
In this paper we focus on web site categorization. We compare some quantitative characteristics of existing web directories, analyze the vocabulary used in the descriptions of web sites in the Yahoo web directory, and propose an approach to categorizing web sites automatically. Our approach is based on the novel concept of salient words. Two realizations of the proposed concept are evaluated experimentally: the former uses words typical of just one category, while the latter uses words typical of several categories. The results show that a method based on a single vocabulary is limited in its ability to properly categorize highly heterogeneous spaces such as the World Wide Web.
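The abstract names the two realizations but does not say how a word is judged salient or how salient words are turned into a category decision. The Python sketch below illustrates one possible reading under explicitly assumed rules. Salience is a hypothetical frequency-ratio rule: a word counts as salient for a category if it occurs there and its relative frequency in that category is at least `ratio` times its mean relative frequency in the other categories. Classification is a simple vote: with `exclusive=True` only words salient for exactly one category vote (the first realization), and with `exclusive=False` a word salient for k categories gives 1/k of a vote to each (the second). The salience rule, the `ratio` threshold, the voting scheme, the toy training data, and every function name are illustrative assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real directory descriptions would need
    # punctuation handling, stop-word removal, and stemming.
    return text.lower().split()


def category_word_freqs(training):
    """Relative word frequencies per category.

    training: dict mapping a category name to a list of site descriptions.
    """
    freqs = {}
    for cat, descriptions in training.items():
        counts = Counter(w for d in descriptions for w in tokenize(d))
        total = sum(counts.values())
        freqs[cat] = {w: n / total for w, n in counts.items()}
    return freqs


def salient_words(freqs, ratio=2.0):
    """Assumed salience rule (not taken from the paper): a word is salient for
    category c if it occurs in c and its relative frequency there is at least
    `ratio` times its mean relative frequency in the other categories."""
    salient = defaultdict(set)
    vocab = set().union(*freqs.values())  # union of all category vocabularies
    for w in vocab:
        for c in freqs:
            f_c = freqs[c].get(w, 0.0)
            others = [freqs[o].get(w, 0.0) for o in freqs if o != c]
            mean_other = sum(others) / len(others) if others else 0.0
            if f_c > 0 and f_c >= ratio * mean_other:
                salient[c].add(w)
    return salient


def build_index(salient):
    """Map each salient word to the set of categories it is salient for."""
    index = defaultdict(set)
    for cat, words in salient.items():
        for w in words:
            index[w].add(cat)
    return index


def score(text, index, exclusive=True):
    """exclusive=True: only words salient for exactly one category vote.
    exclusive=False: a word salient for k categories gives 1/k to each."""
    scores = Counter()
    for w in tokenize(text):
        cats = index.get(w, set())
        if exclusive:
            if len(cats) == 1:
                scores[next(iter(cats))] += 1
        elif cats:
            for c in cats:
                scores[c] += 1 / len(cats)
    return scores


def categorize(text, index, exclusive=True):
    """Return the highest-scoring category, or None if no salient word matched."""
    scores = score(text, index, exclusive)
    return scores.most_common(1)[0][0] if scores else None


# Hypothetical toy data standing in for directory site descriptions.
training = {
    "sports": ["football scores", "tennis results"],
    "games": ["football games", "chess games"],
    "computers": ["software reviews", "hardware reviews"],
}
index = build_index(salient_words(category_word_freqs(training)))
print(categorize("football scores", index))  # sports
print(categorize("football", index))  # None: "football" is salient for two categories
print(sorted(score("football", index, exclusive=False).items()))  # [('games', 0.5), ('sports', 0.5)]
```

In the toy data, "football" is salient for both `sports` and `games`, so the exclusive variant ignores it entirely while the shared variant keeps it at half weight for each category. This shows the basic trade-off between the two realizations: the first discards ambiguous words, the second retains them with reduced influence.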
Similar resources
The Effects of the Meaningfulness of Salient Brand and Product-Related Text and Graphics on Web Site Recognition
Building on the associative strength of memory theory and previous studies on the effects of brand name suggestiveness on advertising effectiveness, two salient elements in a business web page, pictures (such as logos or graphics) and words (such as brand or product names), were examined in three experiments. Web sites where salient pictures and words had business meaning suggestive of brand or...
Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowle...
Positioning of Industries in Cyberspace Evaluation of Web Sites Using Correspondence Analysis
In today’s extremely competitive markets it is crucial for companies to strategically position their brands, products and services relative to their competitors. With the emerging trend in internationalization of companies, especially SMEs, and the growing use of the Internet in this regard, great amount of attention has been turned to effective involvement of the Internet channel in the mar...
Text Categorization of Commercial Web Pages
In this paper we describe a new on-line document categorization strategy that can be integrated within Web applications. A salient aspect is the use of neural learning in both representation and classification tasks. Within text documents conceived as images, the regions of interest (RoI) containing information meaningful for categorization are identified with the support of a supervised neural...
The Influence of the Meaning of Pictures and Words on Web Page Recognition Performance
Firms spend large sums trying to make their “home” page as memorable as possible to attract repeat visits. For this purpose, fancy pictures and words are used to catch the attention of visitors. Interestingly, the effectiveness of all of this effort is nearly completely unknown. This study investigated how picture and word selections affected the recognition success rates of the sites visited by...